Zhang et al. Theoretical Biology and Medical Modelling 2012, 9:38
http://www.tbiomed.com/content/9/1/38



THEORETICAL BIOLOGY AND 
MEDICAL MODELLING 



RESEARCH 



Open Access 



Analysis of the relationship between end-to-end 
distance and activity of single-chain antibody 
against colorectal carcinoma 



Jianhua Zhang, Shanhong Liu, Zhigang Shang, Li Shi and Jun Yun



* Correspondence: shi_li2012@126.com; jun_yun2000@126.com
† Equal contributors
1 Faculty of Biomedical Engineering
of Zhengzhou University,
Zhengzhou 450001, Henan
Province, People's Republic of China
3 Vascular Endocrinology
Department of Xijing Hospital
Affiliated to the Fourth Military
Medical University, Xi'an 710032,
Shaanxi Province, People's Republic
of China

Full list of author information is 
available at the end of the article 



Abstract 

We investigated, by computer-aided simulation, the relationship between the end-to-end
distances of VH and VL joined by different peptide linkers and the activity of single-chain
antibodies. First, we used (G4S)n (where n = 1-9) as the linker to connect VH and VL, and
estimated the 3D structure of the single-chain Fv antibody (scFv) by homology modeling.
After the molecular models were evaluated and optimized, a coordinate system was built
for every protein and unified into one coordinate system, and end-to-end distances were
calculated from the 3D coordinates. After expression and purification of scFv-n with
(G4S)n for n = 1, 3, 5, 7 or 9, the immunoreactivity of purified ND-1 scFv-n was determined
by ELISA. A multi-factorial relationship model was employed to analyze the structural
factors affecting scFv:

$$r(n) = \sqrt{[AB(n) - AB_0]^2 + [CD(n) - CD_0]^2 + [BC(n) - BC_{st}]^2}$$

The relationship between immunoreactivity and r-values revealed that the fusion protein
structure approached the desired state when n = 3, where the r-value was smallest. The
immunoreactivity declined as the r-value increased, but stabilized once the r-value
exceeded a certain threshold. We used a linear relationship to analyze structural factors
affecting scFv immunoreactivity.

Keywords: Single-chain Fv antibody (scFv), (Gly4Ser)n, End-to-end distance,
Homology modeling, Meta-MQAP



Introduction 

Single-chain Fv antibody (scFv) is composed of immunoglobulin heavy- and light-chain
variable regions connected by a short peptide linker [1-3]. scFv is an ideal tool for the
construction of single-chain bi-specific antibody fusion proteins [4-6]. Bivalent antibodies
derived from scFv by genetic engineering have a promising future in the clinic. scFvs can
be therapeutic and at the same time serve as a vector for delivering a toxin [7]. In recent
years, there has been progress in colorectal cancer diagnosis and treatment using scFv as
a carrier. However, achieving both high affinity and anti-tumor activity can be difficult,
even though both are needed for an scFv to be effective. Studies have shown that a proper
linker can give an scFv biological activity that is more effective for clinical applications
[8-10]. Consequently, choosing and designing a proper linker is a key consideration.

© 2012 Zhang et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.

Proteomics has revealed a great deal about the composition, structure, and function
of proteins, and bioinformatics provides a powerful tool to study the structure-activity
relationship of fusion proteins [11-13]. Drug design based on structural simulation
incorporates 3D structure, including data from fusion proteins with various functional
domains and inter-peptide linkers [14-16]. Linkers that contain (G4S)n are the most
widely used [12,17], prompting us to examine their effects on the structure and function
of scFvs.

Materials and methods 

Materials 

IC-2 and CCL-187 cells were cultured using standard conditions. IC-2 is a murine
hybridoma cell line that secretes the monoclonal antibody ND-1, specific for human
colorectal carcinoma. CCL-187 is a human colorectal carcinoma cell line. The pET28a(+)
expression vector and E. coli BL21 were contributed by Prof. J. Yun, Xi'an (China). The
pMD18-T vector, E. coli JM109 competent cells, DNA polymerase, restriction enzymes,
and DNA recovery kits were purchased from TaKaRa Biotechnology (Shanghai, China).
mRNA purification kits and T4 DNA ligase were purchased from Pharmacia Biotech
(Shanghai, China). Anti-His6 tag antibody was obtained from Invitrogen (Foster City,
CA, USA). Ni-NTA resin was provided by QIAGEN (Shanghai, China). MDP and
99mTc were kindly provided by the Department of Nuclear Medicine of China Medical
University (Liaoning Province, China). Heavy chain primers 1 and 2, light chain primer
mix, linker [(GGGGS)n] primer mix, and RS primer mix were purchased from
Pharmacia Biotech.

ND-1 scFv-n was constructed as previously described. Briefly, mRNA was extracted
from 5 × 10^6 IC-2 hybridoma cells, and cDNA was synthesized by reverse transcription
using random primers. VH and VL genes were separately amplified from cDNA by PCR
using a heavy and a light chain primer mix. The VH and VL gene fragments were
recovered and mixed in equimolar ratios for two PCR reactions, the first using a linker
primer mix for 7 cycles, followed by a second using an RS primer mix for 30 cycles. As a
result, the VH and VL gene fragments were linked to form an scFv construct by
overlap-extension (splicing) PCR. The resulting ND-1 scFv-n construct was cloned into
pMD18-T and transformed into E. coli JM109, and positive clones were identified by
colony PCR and DNA sequencing.

Oligonucleotide primers S1 and S2 were designed to add an EcoRI site at the 5'-end of
ND-1scFv-n, and a HindIII site or SalI site at the 3'-end.
S1: 5'-CTGAATTCATGGCCCAGGTGCAGCTGCAGC-3';
S2: 5'-CGCAAGCTTCTAGTCGACTTTCCAGCTTGGTC-3'.
pMD18-T-ND-1scFv-n was used as a template, and the product was cloned into
the vector pET28a(+) after digestion with EcoRI and HindIII, and transformed into
competent E. coli BL21 cells for protein expression.
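As a quick sanity check on the primer design, the following minimal sketch looks for the recognition sequences of the three enzymes named above (GAATTC for EcoRI, AAGCTT for HindIII, GTCGAC for SalI) in the two primers. The helper name `sites_in` is hypothetical and only serves this illustration.

```python
# Recognition sequences of the restriction enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "SalI": "GTCGAC"}

# Primer sequences as given in the text (5' -> 3').
S1 = "CTGAATTCATGGCCCAGGTGCAGCTGCAGC"
S2 = "CGCAAGCTTCTAGTCGACTTTCCAGCTTGGTC"


def sites_in(primer):
    """Return {enzyme: 0-based start position} for each site found in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


print("S1:", sites_in(S1))
print("S2:", sites_in(S2))
```

S1 carries only the EcoRI site, while S2 carries the HindIII site followed by the SalI site, in line with the cloning strategy described.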

Amino acid sequence 

The amino acid sequences of the wild-type VH and wild-type VL are listed below [18],
and illustrated in Figure 1. The amino acid sequence of VH-(G4S)n-VL is:

MAQVQLQQSGPGLVAPSQSLSITCTVSGFSLTTYDVHWVRQPPRKGLEWLGLVW
ANGRTNCTSALMSRISITRDTSKNQVFLTMNSLQTDDTAMYYCARGSYGAVDFWG
QGTTVTVSS(GGGGS)nDIELTQSPASLAVSLGQRATISYRASKSVSTSGYSYMHWQQ
KPGQPPRLLIYLVSNLESGVPARFSGSGSGTDFTLNIHPVEEEDAATYYCQHIRELTRSE
GGPSWK.

Figure 1 Map of VH-linker-VL. (Schematic with panels labelled MAb, Fab and scFv; the linker is marked.)
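The construct family can be written out programmatically. The sketch below assembles VH-(G4S)n-VL from the sequence printed above for any n; the function name `scfv_sequence` is a hypothetical helper, and the split into VH and VL simply follows the position of the (GGGGS)n placeholder in the printed sequence.

```python
# VH and VL segments, copied from the VH-(G4S)n-VL sequence in the text.
VH = ("MAQVQLQQSGPGLVAPSQSLSITCTVSGFSLTTYDVHWVRQPPRKGLEWLGLVW"
      "ANGRTNCTSALMSRISITRDTSKNQVFLTMNSLQTDDTAMYYCARGSYGAVDFWG"
      "QGTTVTVSS")
VL = ("DIELTQSPASLAVSLGQRATISYRASKSVSTSGYSYMHWQQ"
      "KPGQPPRLLIYLVSNLESGVPARFSGSGSGTDFTLNIHPVEEEDAATYYCQHIRELTRSE"
      "GGPSWK")
LINKER_UNIT = "GGGGS"  # one G4S repeat


def scfv_sequence(n):
    """Return the VH-(G4S)n-VL sequence for a linker of n repeats (n >= 0)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return VH + LINKER_UNIT * n + VL


for n in (1, 3, 5, 7, 9):  # the linker lengths that were expressed
    seq = scfv_sequence(n)
    print(n, len(seq), "residues; linker length", len(LINKER_UNIT) * n)
```

Each additional repeat adds five residues, so the nine constructs (n = 1-9) differ only in the 5n-residue linker between the two domains.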

Homology modeling, assessment, and optimization 

The amino acid sequence of a protein determines its higher-order structure. Homology
modeling of this structure relies on the identification of one or more known protein
"templates" that resemble the structure of the query sequence, and on the alignment of
the query sequence residues to the template residues. SWISS-MODEL can be used for
homology modeling; it searches protein sequence and structure databases, such as the
Protein Data Bank (PDB) [19-21]. A three-dimensional model of the target molecule was
obtained through homology modeling, and the model was assessed and optimized using
Meta-MQAP [22,23].

Construction of coordinate system 

PDB files obtained from SWISS-MODEL each carry their own coordinate system (in
which the atomic coordinates are located). To facilitate protein structure comparison,
new coordinate systems were constructed with Matlab 7.0.

Determination of the origin of the coordinate system 

The atomic weights of the atoms in the protein were used to calculate the molecular
weight, and the centroid (center of mass) was obtained from the position of each atom.
The centroid is the origin of the new coordinate system [24].

N: the number of all the atoms; M = 12.01 + 14.01 + 16.00 + 32.07 + 1.00;
M_i: molecular weight of atoms;
[X_k, Y_k, Z_k]: the original three-dimensional coordinates of atoms;
[X_0, Y_0, Z_0]: the origin of the new coordinate system.

To determine the axes, we constructed a second-order moment matrix of the protein's
atomic coordinates. The principal eigenvector of this matrix was taken as the X-axis of
the new coordinate system and the second principal eigenvector as the Y-axis, and these
were used to build a coordinate system for the protein's three-dimensional structure.

The 3 × 3 matrix S constructed from the second-order moments is as follows:

$$S = \begin{pmatrix} M_{200} & M_{110} & M_{101} \\ M_{110} & M_{020} & M_{011} \\ M_{101} & M_{011} & M_{002} \end{pmatrix}$$

Here, $M_{abc} = \sum_k m_k X_k^a Y_k^b Z_k^c$, where

m_k: molecular weight of atom k;
[X_k, Y_k, Z_k]: 3D coordinates of each atom.

The eigenvalues and eigenvectors of S were calculated. The eigenvector corresponding
to the largest eigenvalue was taken as the first axis (X axis, X = [X1, X2, X3]), the
eigenvector corresponding to the second largest eigenvalue as the second axis (Y axis,
Y = [Y1, Y2, Y3]), and similarly for the Z axis.
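The construction of the origin and axes can be illustrated with a minimal sketch. It assumes that the origin is the standard mass-weighted mean of the atomic coordinates and that the moments are taken about that origin. It uses plain power iteration with deflation rather than a library eigen-solver, and it takes Z as the cross product of X and Y, which matches the third eigenvector of a symmetric matrix up to sign. All function names and the toy coordinates are hypothetical, not taken from the authors' Matlab code.

```python
import math

# Atomic masses of the five elements listed in the text (C, N, O, S, H).
MASS = {"C": 12.01, "N": 14.01, "O": 16.00, "S": 32.07, "H": 1.00}


def centroid(atoms):
    """Mass-weighted centre of a list of (element, x, y, z) tuples."""
    total = sum(MASS[a[0]] for a in atoms)
    return tuple(sum(MASS[a[0]] * a[i] for a in atoms) / total for i in (1, 2, 3))


def moment_matrix(atoms, origin):
    """3x3 matrix S of second-order moments about `origin`."""
    S = [[0.0] * 3 for _ in range(3)]
    for element, *xyz in atoms:
        d = [c - o for c, o in zip(xyz, origin)]
        for i in range(3):
            for j in range(3):
                S[i][j] += MASS[element] * d[i] * d[j]
    return S


def dominant_eigenpair(S, iterations=500):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric
    positive semi-definite matrix (assumes the start vector is not orthogonal
    to that eigenvector)."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    lam = sum(v[i] * S[i][j] * v[j] for i in range(3) for j in range(3))
    return lam, v


def principal_axes(S):
    """X = first principal axis, Y = second, Z = X x Y."""
    lam, x = dominant_eigenpair(S)
    # Deflation: remove the first component, then iterate again for the second.
    deflated = [[S[i][j] - lam * x[i] * x[j] for j in range(3)] for i in range(3)]
    _, y = dominant_eigenpair(deflated)
    z = [x[1] * y[2] - x[2] * y[1], x[2] * y[0] - x[0] * y[2], x[0] * y[1] - x[1] * y[0]]
    return x, y, z


# Toy atoms with made-up coordinates: longest spread along x, then y, then z.
toy = [("C", 3.0, 0.0, 0.0), ("C", -3.0, 0.0, 0.0),
       ("C", 0.0, 2.0, 0.0), ("C", 0.0, -2.0, 0.0),
       ("C", 0.0, 0.0, 1.0), ("C", 0.0, 0.0, -1.0)]
origin = centroid(toy)
X, Y, Z = principal_axes(moment_matrix(toy, origin))
print("origin:", origin)
print("X:", X, "Y:", Y, "Z:", Z)
```

Projecting every atom onto X, Y and Z after subtracting the origin places all models in a common, shape-derived frame, which is what makes the end-to-end distances of different constructs comparable.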

Analysis of End-to-end distance in fusion proteins 

The end-to-end distance is the distance between the first and the last α-carbon atom
in a protein. We obtained this information, together with the X/Y/Z coordinates of the
atoms, from the PDB files. The algorithm used is as follows:

A. Locate the first and last α-carbon atoms in the wild-type VH and VL, and the
same in the protein after introduction of (G4S)n.

B. Calculate the end-to-end distance of wild-type VH (VL) and of mutant VH (VL) after
introduction of (G4S)n.

C. Analyze the relationship between the end-to-end distance and n.
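Steps A and B can be sketched as follows, assuming the models are stored in the standard fixed-column PDB format (atom name in columns 13-16, x/y/z in columns 31-54). The function names and the toy record are hypothetical; a real analysis would read the SWISS-MODEL output files and restrict the residue range to VH, VL or the linker.

```python
import math


def ca_coordinates(pdb_text):
    """Return (x, y, z) of every alpha-carbon ATOM record, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, x/y/z in 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def end_to_end_distance(pdb_text):
    """Distance between the first and the last alpha-carbon atom."""
    ca = ca_coordinates(pdb_text)
    if len(ca) < 2:
        raise ValueError("need at least two alpha-carbon atoms")
    return math.dist(ca[0], ca[-1])


def atom_line(serial, name, res, seq, x, y, z):
    """Format one ATOM record (helper used only to build the toy example)."""
    return "ATOM  {:>5d} {:<4s} {:>3s} A{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res, seq, x, y, z)


# Toy three-residue chain with made-up coordinates.
toy_pdb = "\n".join([
    atom_line(1, " N  ", "MET", 1, 1.0, 0.0, 0.0),
    atom_line(2, " CA ", "MET", 1, 0.0, 0.0, 0.0),
    atom_line(3, " CA ", "ALA", 2, 1.5, 2.0, 0.0),
    atom_line(4, " CA ", "LYS", 3, 3.0, 4.0, 0.0),
])
print(end_to_end_distance(toy_pdb))
```

Because a Euclidean distance is unchanged by translation and rotation, the value is the same in the original PDB frame and in the unified coordinate system.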

Biological experiments 

Expression and purification of ND-1scFv-n

pET28a(+)-ND-1scFv-n plasmids were constructed as expression vectors and transformed
into E. coli BL21 cells, which were grown in 100 ml LB broth with 50 mg/ml
kanamycin at 37°C. When the culture attained an O.D. of 0.6, IPTG was added to a final
concentration of 1 mM, and cells were shaken at 37°C. After 3.5 h, the culture was
centrifuged at 5,000 rpm for 10 min, and the cell pellets were treated with lysis solution.
After sonication and centrifugation, inclusion bodies containing scFv proteins were
solubilized and denatured in the presence of 6 M guanidine hydrochloride. Affinity
chromatography on Ni-NTA resin was used to purify scFv, and the column was eluted
sequentially with 8 M urea at pH 8.0, 6.5 and 4.2. The pH 4.2 fraction, containing scFv,
was collected and recovered by dialysis. Protein purity and concentration were
determined by Bradford assay.

Western blot analysis 

ND-1scFv-n proteins were detected by western blot analysis. BL21 cells transformed with
pET28a(+)-ND-1scFv-n were incubated separately in loading buffer (125 mmol/L
Tris-HCl, pH 6.8, 10% β-mercaptoethanol, 4.6% SDS, 20% glycerol and 0.003%
bromophenol blue) for 5 min at 100°C, separated by sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE), and electroblotted onto PVDF membrane
(Bio-Rad, Hercules, CA, USA). Non-specific binding sites were blocked for 1 h with 5%
nonfat milk in TPBS (PBS containing 0.05% Tween 20), and the membrane was incubated
overnight at 4°C with primary antibody. After washing three times in TPBS, the membrane
was incubated with horseradish peroxidase-conjugated goat anti-rabbit IgG for 2 h at
room temperature, and washed twice with TPBS. The immunoblot signal was detected by
autoradiography using an enhanced chemiluminescence detection kit.

ELISA assay for activity of ND-1scFv-n

CCL-187 cells (5 × 10^4) were grown in 96-well microtiter plates at 37°C for 24 h, fixed
with 2.5% glutaraldehyde and blocked with 1% BSA, followed by incubation with
ND-1 IgG or ND-1scFv-n at 37°C for 2 h. After washing three times with PBS, anti-His6
antibody was added to the wells containing ND-1scFv-n and incubated. The plate was
washed, and HRP-labeled goat anti-mouse IgG was added to both the ND-1 IgG and
ND-1scFv-n wells. After incubation at 37°C for 2 h, TMB substrate was added, and the
samples were incubated in darkness for 30 min. The reaction was terminated with
1 M H2SO4. PBS was used as a negative control.

Results 

Protein structures 

A coordinate system was built with Matlab 7.0 from the atomic coordinates in the PDB
files received from SWISS-MODEL. The resulting maps were used for comparison of the
protein structures (Figure 2). Homology models were built with SWISS-MODEL, and
Meta-MQAP was used to assess and optimize them. The accuracy score (GDT-TS) of each
model and the root mean square deviation (RMSD) are shown in Table 1. The assessment
results show that the models are reliable.

Table 1 Global model accuracy analyzed by Meta-MQAP

Sequence    GDT-TS         RMSD
VHL         64.189         2.392
VHL1        75.548         1.959
VHL2        [illegible]    [illegible]
VHL3        64.811         2.400
VHL4        75.332         2.082
VHL5        77.544         2.016
VHL6        75.000         2.120
VHL7        76.991         1.904
VHL8        73.783         2.229
VHL9        60.841         2.860

Local alignment

The end-to-end distances of VH (AB), VL (CD) and the linker (BC) at different n values
are presented in Table 2. Linker BC appeared relatively stable from n = 1-7, whereas the
end-to-end distances AB and CD changed. When n increased within a certain range, the
end-to-end distance of VH fluctuated relatively widely. The end-to-end distance of VL
basically did not change except when n = 6 and n = 0. The data suggest that the major
factor for this was that the median value of BC, the end-to-end distance of the linker
peptides, was about 22.6622. Although the end-to-end distance changes were small, the
values of AB and CD fluctuated near the ideal state. Thus, the effects of the linker
peptide structural factors (r) on VH and VL can be represented by the following equation:

$$r(n) = \sqrt{[AB(n) - AB_0]^2 + [CD(n) - CD_0]^2 + [BC(n) - BC_{st}]^2}$$

The ideal fusion protein should have a stable structure, with the linker peptide giving
the minimum r(n), as shown in Figure 3. The r-values were obtained for each linker
length. For n = 1-9, the r-values were 36.8161, 8.0150, 0.8415, 22.1579, 24.4747,
582.2451, 46.8344, 88.6852, and 112.3846, with a median value of 24.4747.
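The equation can be evaluated directly from Table 2. The sketch below rests on three labeled assumptions: AB_0 and CD_0 are taken to be the wild-type VH and VL end-to-end distances (36.7939 and 14.4022), and BC_st is taken to be the median BC over n = 1-9. Under these assumptions the n = 2 and n = 3 rows give 8.0150 and 0.8415, matching the reported r-values; only those two rows are evaluated here.

```python
import math
from statistics import median

# Table 2 rows as (BC, AB, CD) for n = 1..9.
TABLE2 = {
    1: (16.7822, 34.9540, 13.1215),
    2: (22.6653, 44.7992, 14.0089),
    3: (22.6622, 37.5364, 14.0063),
    4: (19.6630, 23.6371, 14.0095),
    5: (23.7909, 13.59664, 14.0052),
    6: (46.6356, 37.536, 21.8867),
    7: (27.4472, 12.8588, 14.0167),
    8: (13.9244, 24.4632, 14.0393),
    9: (12.5718, 26.2316, 14.0417),
}

AB0 = 36.7939  # assumption: end-to-end distance of wild-type VH
CD0 = 14.4022  # assumption: end-to-end distance of wild-type VL
BC_ST = median(bc for bc, _, _ in TABLE2.values())  # assumption: median BC


def r_value(n):
    """Structural deviation r(n) of the construct with linker (G4S)n."""
    bc, ab, cd = TABLE2[n]
    return math.sqrt((ab - AB0) ** 2 + (cd - CD0) ** 2 + (bc - BC_ST) ** 2)


print("BC_st =", BC_ST)
for n in (2, 3):
    print(n, round(r_value(n), 4))
```

Read this way, r(n) is a Euclidean distance in (AB, CD, BC) space between a construct and a reference state, so a small r means that the linker has left both domain distances close to their wild-type values.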

The results suggest that the r-value was smallest when n = 3, where the structure of the
fusion protein was closest to the ideal state. The r-values tended to increase as n, and
hence the linker length, increased, with the VH and VL structures being affected to a
greater extent. At n = 6 the r-value was the largest, and therefore the least satisfactory.



Table 2 The values of AB, CD and BC at different values of n

Sequence        BC         AB          CD
Wild-type VH    -          36.7939     -
Wild-type VL    -          -           14.4022
n = 0           -          29.6393     6.8006
n = 1           16.7822    34.9540     13.1215
n = 2           22.6653    44.7992     14.0089
n = 3           22.6622    37.5364     14.0063
n = 4           19.6630    23.6371     14.0095
n = 5           23.7909    13.59664    14.0052
n = 6           46.6356    37.536      21.8867
n = 7           27.4472    12.8588     14.0167
n = 8           13.9244    24.4632     14.0393
n = 9           12.5718    26.2316     14.0417

Figure 3 The diagram shows the relationship between the length of linked peptides and r value.



Determination of expression and purity of proteins 

pET28a(+)-ND-1scFv-n plasmids were transformed into E. coli BL21, and protein
expression was induced with IPTG. Western blot analysis indicated that BL21 lysates
expressed scFv-n proteins with bands of 30 kDa (Figure 4). The sequence encoding
the short His-tag peptide is upstream of the multi-cloning site (MCS) of vector
pET28a(+), so ND-1scFv-n was expressed as a recombinant fusion protein. Western
blot analysis showed that scFv-n protein is expressed in inclusion bodies in the
supernatant of BL21 lysates. Inclusion body protein was purified to 94% by metal affinity
chromatography using Ni-NTA resin, which binds the His-tag at the N-terminal end
of scFv.

Analysis of the relationship between immunoreactivity and End-to-end distance 

The immunoreactivity of purified ND-1scFv-n was determined by ELISA. scFv-n
exhibited an immunoreactivity similar to that of the parental ND-1 antibody, and
demonstrated good binding to CCL-187 cells expressing the colorectal
carcinoma-associated antigen LEA. This suggests that scFv-n retains good specificity
and activity.

Figure 4 Western blot analysis of ND-1scFv-n in BL21 cells. 1: Expression of pET28a(+)-ND-1scFv with
(G4S)1; 2: Expression of pET28a(+)-ND-1scFv with (G4S)3; 3: Expression of pET28a(+)-ND-1scFv with (G4S)5; 4:
Expression of pET28a(+)-ND-1scFv with (G4S)7; 5: Expression of pET28a(+)-ND-1scFv with (G4S)9.

Table 3 shows the relationship between scFv immunoreactivity (A450 value) and
r-values. The immunoreactivity declined with increasing r-values. It changed
significantly when the r-value was less than 42.3716. When the r-value exceeded this
value, immunoreactivity became relatively stable (Figure 5).
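The trend in Table 3 can be summarized with an ordinary least-squares line. The sketch below is only an illustration of such a fit on the five (r-value, A450) pairs from Table 3; the function name `linear_fit` is hypothetical, and the fitting procedure actually used for Figure 5 is not specified in the text.

```python
# Table 3 rows as (n, r-value, A450 value).
TABLE3 = [
    (3, 0.8415, 1.17),
    (5, 24.4747, 1.02),
    (1, 36.8161, 0.82),
    (7, 46.8344, 0.75),
    (9, 112.3846, 0.71),
]


def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus the Pearson coefficient."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson = sxy / (sxx * syy) ** 0.5
    return slope, intercept, pearson


r_values = [row[1] for row in TABLE3]
a450 = [row[2] for row in TABLE3]
slope, intercept, pearson = linear_fit(r_values, a450)
print("slope:", slope, "intercept:", intercept, "Pearson:", pearson)
```

A negative slope corresponds to the reported decline of immunoreactivity with increasing r-value. With only five points, and with the plateau at large r described above, a single straight line is a coarse summary rather than a full model.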

Discussion 

Homology modeling has been successfully applied to interpreting the correlation of
protein sequence, structure, and function. Using a structural model, multiple sequences
of orthologous proteins can be compared and evaluated according to the restrictions of
natural selection and the requirements of protein folding, stability, dynamics, and
function. Homology modeling can help determine which functional group a protein
belongs to, based on analysis of conserved residues in the binding site. Homology
modeling also plays an important role in computer-aided drug design [25,26].

One basic issue in the study of protein structure is structural comparison. A relatively
direct comparison method is to consider the protein as a rigid structure composed of a
set of points, and then compare the corresponding residues of different proteins.
Initially, a rigid superposition method was used (translating and rotating the spatial
structure of the protein to find the corresponding residues between two proteins)
[27,28]. Chen, however, proposed using the weight distribution of the atoms composing
the protein to calculate the protein's center of gravity, together with a 3 × 3 matrix
composed of second-order moments [24]. On this basis, one can use principal component
analysis (PCA) to find the main and secondary axes. The best rigid superposition is
obtained by superposing the centers of gravity of the proteins and then rotating them
so that their main axes coincide. In this study, we used the atomic weights to obtain
the centroid from the coordinates of each atom.

It is recognized that fusion proteins have varied affinity and anti-tumor activity
compared to the original molecules, due in large part to the structural alterations of the
fusion proteins [4,28-31]. The inter-peptide linkers can be optimized with
computer-aided design [32]. Based on homology modeling of derivatives [33], future
designs of inter-peptide linkers can be viewed as solving an equation. The structure and
characteristics of target molecules, and the composition, length, and flexibility of the
inter-peptide linker should be taken into consideration [34,35].

In previous studies [35-37], the length and composition of the linkers used to join VH
and VL in bivalent single-chain antibodies often affected stability and function. Linkers
may be too short to fold correctly because of intermolecular static influence, or too long
to ameliorate the immunogenicity of antibodies. To satisfy these requirements, several
design strategies have been developed. One approach is to use the flexible glycine-rich
sequences (G4S)n as tethers. Linkers comprising repeats of G4S have been used to
construct bivalent single-chain antibodies targeting colorectal cancer with linkers of
5-15 amino acids [18,36]. With a 5 amino acid linker, immune reactivity was
unsatisfactory, possibly because the linker was too short to provide an effective distance
for the two antigen-binding sites, which affected the stability of the cross-linked protein.
The linker with 15 amino acids tended to fold correctly and retained the bivalent
single-chain antibody's affinity and capacity. It has long been noted that sufficient
flexibility and length for VH and VL domains are achieved by assembling them in the
natural Fv orientation to form a monovalent antigen-binding site, which is comparable
to the Fab fragment of native antibodies. It has also been shown that the length and
sequence of the linker peptide significantly affect scFv expression and stability [36].

Table 3 The relationship between r-value and the corresponding biological activity (A450 value)

n    r value     A450 value
3    0.8415      1.17
5    24.4747     1.02
1    36.8161     0.82
7    46.8344     0.75
9    112.3846    0.71

Figure 5 The linear relationship between the r value and ND-1sc(Fv)2-n immune reactivity.
(Plot title: The Relationship Between the Length of Linker and r Value and A450 Value; x-axis: the n value, length of linker.)

It should be pointed out that the impact of linker length on the activity and affinity
of engineered antibodies depends strongly on the distance between the N- and
C-terminal of the VH domain [37]. A certain degree of flexibility in the linker is required
for the functional cooperation of the two subunits. The goal of this study was to
characterize novel scFvs and to quantify the impact of the linker peptide on binding
affinity. Using computer-guided homology modeling, scFvs with different linker peptides
were proposed based upon the activity and the end-to-end distance. Our aim was to
evaluate the impact of (G4S)n on the structure and function of VH and VL, and to find
the relationship between the VH/VL end-to-end distance and n (or BC) in bivalent
single-chain antibodies targeting colorectal cancer. A multi-factor relationship model was
established to evaluate VH and VL structural factors using the following formula:

$$r(n) = \sqrt{[AB(n) - AB_0]^2 + [CD(n) - CD_0]^2 + [BC(n) - BC_{st}]^2}$$

Based on simulated data and biological experiments, a linear relationship has been
established between the immunoreactivity and r-values. The immunoreactivity declines
as the r-value increases. The fusion protein structure is closest to ideal when n = 3.
When n is 6, the protein structure is least satisfactory. However, further exploration of
this relationship is needed. Indeed, the expression level and activity of scFv depend
largely on the length and sequence of the linker. Thus, successful construction of an scFv
depends on the selection of a linker that neither interferes with the folding and
association of the VH and VL domains nor reduces the stability and recognition abilities
of the Fv molecule.

In summary, based on the databases of natural protein structures and their associated
functions, we predicted the structure and function of fusion proteins by homology
modeling and further conducted biological experiments to validate our calculations.
Thus, a dual approach that incorporates molecular modeling and linker design of
engineered antibodies with quantitative determination of antibody affinity is useful to
optimize construction. Our approach provides not only a rationale for designing novel
engineered antibodies using molecular modeling, but also new insight into
quantifying antibody binding affinity, especially at low protein concentration. A
combination of bioinformatics and genetic research may therefore be beneficial in
exploring new agents for genetic engineering of antibodies.